Bond, Francis, Timothy Baldwin, Richard Fothergill and Kiyotaka Uchimoto (2012) Japanese SemCor: A Sense-tagged Corpus of Japanese, In Proceedings of the 6th International Global Wordnet Conference (GWC 2012), Matsue, Japan
Abstract
In this paper we describe the creation of the Japanese SemCor (JSEMCOR) sense-tagged corpus of Japanese. The corpus is a translation of the English SEMCOR, with senses projected across from English. The final corpus consists of 14,169 sentences with 150,555 content words, of which 58,265 are sense tagged. The corpus is one of the corpora used to provide sense frequency data for the Japanese Wordnet.
Similar resources
Development of the Japanese WordNet
After a long history of compilation of our own lexical resources, EDR Japanese/English Electronic Dictionary, and discussions with major players on development of various WordNets, Japanese National Institute of Information and Communications Technology started developing the Japanese WordNet in 2006 and will publicly release the first version, which includes both the synset in Japanese and the...
Enhancing the Japanese WordNet
The Japanese WordNet currently has 51,000 synsets with Japanese entries. In this paper, we discuss three methods of extending it: increasing the cover, linking it to examples in corpora and linking it to other resources (SUMO and GoiTaikei). In addition, we outline our plans to make it more useful by adding Japanese definition sentences to each synset. Finally, we discuss how releasing the corp...
"PolNet - Polish WordNet" project: PolNet 2.0 - a short description of the release
In December 2011/January 2012 we have released the main deliverable of the project "PolNet Polish WordNet". It was first presented and distributed (as PolNet 1.0) at the 5th Language and Technology Conference in Poznań (2011) and (informally, with kind permission of the organizers) distributed during the Global Wordnet Conference in Matsue, Japan, in January 2012. We intend to present to the pa...
Boot-Strapping a WordNet Using Multiple Existing WordNets
In this paper we describe the construction of an illustrated Japanese Wordnet. We bootstrap the Wordnet using multiple existing wordnets in order to deal with the ambiguity inherent in translation. We illustrate it with pictures from the Open Clip Art Library.
Baldwin, Timothy, Su Nam Kim, Francis Bond, Sanae Fujita, David Martinez and Takaaki Tanaka (2008) MRD-based Word Sense Disambiguation: Further Extending Lesk, In Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India
This paper reconsiders the task of MRD-based word sense disambiguation, in extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenisation schemes, scoring mechanisms, methods of gloss extension and filtering methods. In experimentation over the Lexeed Sensebank and the Japanese Senseval2 dictionary task, we demonstrate that character bigrams with sense-s...